fsutil: add NFS soft-mount options to prevent kernel panic on hot-unplug by 0-danielviktorovich-0 · Pull Request #149 · nohajc/anylinuxfs

0-danielviktorovich-0 · 2026-05-19T07:52:39Z

Summary

When a user physically disconnects the USB-C cable from an anylinuxfs-managed device without running anylinuxfs unmount first, the macOS NFS client (default hard-mount semantics) retries indefinitely against the now-unreachable NFS server inside libkrun. The kernel holds IOMediaBSDClient in a busy state until watchdogd triggers panic(busy timeout[1]) after 60s.

This PR adds soft,timeo=100,retrans=3 to the default macOS NFS mount options so the kernel returns EIO after ~30s instead of hanging forever when the underlying VM/disk is gone.

Reproduction

Reproduced 3 times over 8 days on Mac16,8 / M4 Pro with macOS 26.4.1 → 26.5 (panic persisted across OS update, confirming the bug is in our integration rather than macOS itself).

Identical signature in all three panic-full-*.panic files:

panic(cpu N caller 0x...): busy timeout[1], (60s):
'IOMediaBSDClient' (1,1812001) @IOService.cpp:5986

Panicked task ...: pid <N>: watchdogd
last started kext: com.apple.iokit.SCSITaskUserClient 545.100.10

Reproduction steps:

Mount: sudo anylinuxfs mount /dev/disk5s1 -o noatime,compress=zstd:3
Wait for mount at /Volumes/<label>
Physically disconnect USB-C (without anylinuxfs unmount)
Anywhere from seconds to hours later, kernel panic — triggered by any background process touching the dead mount (Spotlight reindex, Time Machine attempt, mds_stores, etc.)

Why current `deadtimeout=45` is insufficient

deadtimeout=45 (added in fsutil.rs:113) helps Finder's manual eject path — Finder force-unmounts after 45s of unresponsive RPC. But it only counts toward force-unmount once Finder decides the mount is dead.

For scheduled background I/O (Spotlight, Time Machine, mds_stores, polling daemons), the kernel keeps retrying NFS RPCs indefinitely (hard-mount default), and IOMediaBSDClient stays busy → kernel watchdog fires at 60s before deadtimeout resolves anything.

Comment at fsutil.rs:204-206:

/// macOS relies on DiskArbitration teardown — no-op.

This is an incorrect assumption — DARegisterDiskDisappearedCallback is only registered inside synchronous EventSession::wait_for_unmount (diskutil/darwin.rs:353-378). After the CLI exits, no run loop is running, so no callback fires on unexpected disconnect.

What this PR changes

anylinuxfs/src/fsutil.rs — NfsOptions::default() on macOS now also inserts:

opts.insert("soft".into(), "".into());
opts.insert("timeo".into(), "100".into());  // tenths of a second → 10s per try
opts.insert("retrans".into(), "3".into());

Combined with existing deadtimeout=45, this provides defense-in-depth against hot-unplug:

Background process retries: bounded to ~30s, returns EIO
Finder eject: works as before (deadtimeout=45)
Kernel watchdogd: never triggers because IOMediaBSDClient is released within 30s
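
The ~30s figure above follows from simple arithmetic on the two options. A minimal sketch of that calculation, assuming the model used in this PR (each of the `retrans` attempts waits a full `timeo`, with no backoff); the function name and constant are illustrative and not part of anylinuxfs, and the real NFS client may time retries differently:

```rust
use std::time::Duration;

/// Busy timeout quoted in the panic log (60s). Illustrative constant.
const WATCHDOG_BUSY_TIMEOUT: Duration = Duration::from_secs(60);

/// Worst-case time before a soft mount gives up and returns EIO.
/// `timeo` is expressed in tenths of a second, as in the mount option.
/// Assumption: `retrans` attempts of `timeo` each, without backoff.
fn soft_mount_budget(timeo_deciseconds: u64, retrans: u32) -> Duration {
    Duration::from_millis(timeo_deciseconds * 100) * retrans
}
```

With `timeo=100` and `retrans=3` this gives 30s, which leaves headroom under the 60s watchdog limit if the no-backoff assumption holds.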

Trade-offs

The trade-off of soft vs hard mount:

Pro: External media disconnect → graceful EIO instead of indefinite hang / kernel panic
Con: Transient NFS lag (e.g. 1-2s during VM cleanup phase) could potentially return EIO

I think this trade-off is appropriate for anylinuxfs's use case — these are external removable media, not always-online network drives. Operations against a phantom mount should fail clearly.

If you'd prefer this gated behind a CLI flag (--soft-mount) for opt-in, happy to refactor.
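
For discussion, a rough sketch of what the opt-in variant could look like. Everything here is hypothetical: the function name, the boolean parameter standing in for a `--soft-mount` flag, and the option ordering are assumptions, not the existing anylinuxfs code:

```rust
/// Builds a comma-separated NFS option string for macOS.
/// `soft_mount` stands in for a hypothetical `--soft-mount` CLI flag.
fn macos_nfs_mount_opts(soft_mount: bool) -> String {
    // Existing defaults mentioned in this PR.
    let mut opts: Vec<(&str, &str)> = vec![("vers", "3"), ("deadtimeout", "45")];
    if soft_mount {
        // Options proposed by this PR; an empty value means a bare flag.
        opts.extend([("soft", ""), ("timeo", "100"), ("retrans", "3")]);
    }
    opts.iter()
        .map(|&(key, value)| {
            if value.is_empty() {
                key.to_string()
            } else {
                format!("{key}={value}")
            }
        })
        .collect::<Vec<_>>()
        .join(",")
}
```

Calling `macos_nfs_mount_opts(false)` would keep today's behavior, so hard-mount semantics remain the default unless the user opts in.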

Testing

✅ cargo build -F freebsd passes
✅ ./run-rust-tests.sh passes (41 tests)
✅ Added regression test fsutil::tests::default_nfs_opts_include_soft_mount_semantics to lock in the new defaults — fails if any of soft, timeo=100, retrans=3, or existing deadtimeout=45 / vers=3 is removed
cargo fmt applied
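
To illustrate the shape of the regression check, here is a simplified, self-contained sketch. It is not the test added in this PR: `default_macos_nfs_opts` is a hypothetical stand-in for `NfsOptions::default()`, and the map type is an assumption.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for the macOS defaults described in this PR.
fn default_macos_nfs_opts() -> BTreeMap<String, String> {
    let mut opts = BTreeMap::new();
    opts.insert("vers".to_string(), "3".to_string());
    opts.insert("deadtimeout".to_string(), "45".to_string());
    opts.insert("soft".to_string(), String::new());
    opts.insert("timeo".to_string(), "100".to_string());
    opts.insert("retrans".to_string(), "3".to_string());
    opts
}

/// Returns the names of expected options that are missing or carry a
/// different value, so a failing check names the offending option.
fn missing_defaults(opts: &BTreeMap<String, String>) -> Vec<&'static str> {
    const EXPECTED: [(&str, &str); 5] = [
        ("soft", ""),
        ("timeo", "100"),
        ("retrans", "3"),
        ("deadtimeout", "45"),
        ("vers", "3"),
    ];
    EXPECTED
        .iter()
        .filter(|(key, value)| opts.get(*key).map(String::as_str) != Some(*value))
        .map(|(key, _)| *key)
        .collect()
}
```

A test would then assert that `missing_defaults(&default_macos_nfs_opts())` is empty; removing or changing any of the five options makes the returned list non-empty.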

What this PR does NOT address (future work)

There's a deeper architectural issue: even with soft mount, the proper fix is a persistent DiskArbitration listener that automatically triggers graceful unmount on disk-disappeared events for managed disks. The existing DARegisterDiskDisappearedCallback machinery in diskutil/darwin.rs is already imported — it just needs to live outside the synchronous CLI flow.

I have a working external Python+pyobjc implementation as a local LaunchAgent that does this — DARegisterDiskDisappearedCallback in a persistent process triggers cleanup within ~100ms of physical disconnect. Three architectural approaches I considered for upstreaming it: (1) new long-lived daemon as LaunchAgent, (2) listener thread inside existing per-mount supervisor, (3) detect disk-removed event inside libkrun guest (vmproxy) and notify host via existing TCP control socket port 7350. Happy to submit a follow-up PR with design discussion if you're open to direction.

But this PR is intentionally scoped small — it's a low-risk, immediate-impact fix that covers the primary failure mode (hot-unplug → kernel panic via background I/O) using the existing NFS option mechanism. The structural fix can come later.

Maintainer feedback questions

Are NFS soft-mount defaults acceptable for macOS, or would you prefer opt-in via CLI flag?
Is the regression test style/location appropriate, or would you prefer it elsewhere?
Would a separate issue for tracking the broader DiskArbitration listener work be useful, or shall we discuss here?

Thanks for anylinuxfs — it's the cleanest way I've found to read btrfs on Mac without macFUSE/SIP compromises. Hoping to help make it production-grade for external removable media.

When a user physically disconnects the USB-C cable from an anylinuxfs-managed device without running `anylinuxfs unmount` first, the macOS NFS client (default hard-mount semantics) retries indefinitely against the now-unreachable NFS server inside libkrun. The kernel holds `IOMediaBSDClient` in a busy state until `watchdogd` triggers `panic(busy timeout[1])` after 60s.

Reproduced 3 times over 8 days on Mac16,8 / M4 Pro with identical signature in `/Library/Logs/DiagnosticReports/panic-full-*.panic`:

panic(cpu N): busy timeout[1], (60s):
'IOMediaBSDClient' (1,1812001) @IOService.cpp:5986
Panicked task ... pid <N>: watchdogd
last started kext: com.apple.iokit.SCSITaskUserClient

The existing `deadtimeout=45` option supports Finder's manual eject path but does not cover scheduled background I/O (Spotlight reindex, Time Machine attempts, mds_stores, daemon polling) that hits the dead mount after hot-unplug. macOS does not auto-teardown NFS mounts on physical disconnect: `DiskArbitration` only fires callbacks for registered listeners, which we don't have outside the synchronous CLI flow (see fsutil.rs comment near line 206 acknowledging the gap).

Soft-mount semantics with bounded timeouts return EIO after ~30s (3 retries × 10s `timeo`) instead of holding the registry busy. Returning EIO is appropriate when the physical device is gone: operations that would have hung forever now produce a meaningful error and the kernel releases the IOKit entry.

Includes regression test in `fsutil::tests::default_nfs_opts_include_soft_mount_semantics`. Discussed in GitHub issue (to be filed alongside this PR).

gemini-code-assist

Code Review

This pull request updates the default NFS mount options for macOS in anylinuxfs/src/fsutil.rs to include soft, timeo=100, and retrans=3. These changes bound kernel-level retries when the microVM becomes unreachable, preventing potential kernel panics caused by indefinite retries. A regression test was also added to ensure these options remain in the default configuration. I have no feedback to provide.

nohajc · 2026-05-19T15:31:36Z

Thank you for the pull request. Soft mount sounds reasonable. I'm going to review the change.

Just note that your AI agent got some of the analysis wrong. For example the run loop in wait_for_unmount absolutely does work after the CLI has exited (it forks and continues in background).

However, it conflates two kinds of disk arbitration events: those that track the NFS mount and those that would track the underlying disk. There is currently no tracking for the latter.

Anyway, in your own words, did the change help to resolve your issue?

nohajc · 2026-05-19T15:36:56Z

As for any further improvements in this direction, I would prefer not to involve LaunchAgent. There is already one process running in the background which monitors the virtual machine and the NFS eject event. It could be extended to also watch for the disk being disconnected.

gemini-code-assist Bot reviewed May 19, 2026

0-danielviktorovich-0 wants to merge 1 commit into nohajc:main from 0-danielviktorovich-0:fix/hot-unplug-soft-mount
